Method and apparatus for memory routing scheme.

A memory routing scheme for a computer system having M processors (P0 ... P5) and N memories is
described. M processors are coupled through a randomizer (23) to a routing network (10), such as a crossbar.
The crossbar is coupled to N memories (MEM 0 ... 5). When a memory address is specified by a processor, it is
acted on by the randomizer and a routed address is given to the memory. The memory having the routed
address is coupled to the processor for the access. By utilizing a random routing scheme, the memories are not
optimized for any one particular access mode, but present the same look to the processors regardless of the
access mode. The average number of collisions in this scheme is a function of the number of memories, number
of processors and number of access ports. In one embodiment, a hashing table is utilized for the assignment of
the routing address to the memories. Alternatively, a randomizing function is utilized to generate a routing
address.

METHOD AND APPARATUS FOR MEMORY ROUTING SCHEME

BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates to the field of memory systems and routing schemes for reading and writing data.

2. Background Art

A number of computer applications require the use of a plurality of processors to achieve high speed
processing power. This is particularly true in data intensive applications such as computer graphics. One
example of a prior art multiple processor system is the single instruction, multiple data stream (SIMD)
computer described in co-pending United States Patent Application number 175,621 and assigned to the
assignee of the present invention.

One specific application of multiple processor computer systems is found in the rendering of three
dimensional image volumes. Volume image data representing points in three dimensional space is stored in
a memory array. The data generally represents rows (x axis), columns (y axis) and shafts (z axis). The
manner in which data is acquired from a memory system is referred to as an "access mode." For example,
in certain volume imaging applications, data is accessed as linearly adjacent in the x, y or z directions or in
planes xy, xz or yz. When the access mode is known, the memory routing scheme can be optimized to
reduce or eliminate collisions between the plurality of processors and the memory units.

Mass storage of a computer system is typically defined as an array of smaller memory units. These
smaller memory units may be actual physical memories themselves, such as a plurality of random access
memories (RAM), erasable programmable read only memories (EPROMs), electronically erasable read only
memories (EEPROMs), etc., or an electronic sub-division of a single large storage unit. The mass storage
of a computer system may therefore be thought of as a series of linearly adjacent memories or as a series
[...].

One prior art memory scheme is referred to as a "non-common memory" parallel processing
architecture. In this scheme, each processor has its own associated memory which cannot be accessed by
another processor. This prevents collisions between processors. However, this scheme is useless in image
processing applications in which two passes through a data set are required (one in the vertical orientation
and one in the horizontal orientation).

Another prior art memory routing scheme uses a single bus coupling the processors to one or more
memories. Such a scheme guarantees collisions and prevents the use of more than a single processor at
one time. It is possible to achieve sufficient bandwidth by providing a large data block size. However, large
block sizes give rise to high latency times and low performance when operating on small blocks of data.

Another prior art scheme is the use of a dual port RAM so that two processors can access a single
RAM, provided both processors do not attempt to access the same address in the memory. However, the
prior art dual port scheme is limited because, in present technology, only dual port RAMs are available.
Therefore, only two processors are supported when it may be desired to utilize a large number of
processors.

Consider a multiple processor computer system connected via a switching network
having M processors and N memories. If N ≥ M, it is possible for each processor to access a different
memory during a memory operation. When two or more processors are attempting to access the same
memory, a "collision" takes place and a method of determining which processor will access the memory in
which order must be provided. When the predominant or preferred access mode is known, the memory
assignment scheme or logical to physical address mapping scheme can be optimized to reduce or
eliminate collisions between processors. However, if the access mode is not known, or if the access mode
changes in an unpredictable manner, it is difficult to optimize the memory assignment scheme or logical to
physical address mapping scheme to achieve maximum performance.

Another prior art memory routing scheme is described in co-pending U. S. Patent Application entitled
METHOD AND APPARATUS FOR STORAGE AND ACCESS OF THREE DIMENSIONAL DATA ARRAYS
filed on October 13, 1988, Serial Number 257,936 and assigned to the assignee of the present invention.



EP 0 373 299 A2

This scheme involves a method in which blocks of data, each representing an n x n x n cube of elements,
are stored in n memories, and n elements may be simultaneously accessed adjacent in the X, Y or Z
directions. Memory elements are written and accessed by using a rotation scheme optimized for row,
column and shaft access modes.

One problem with prior art memory routing schemes is their dependence on access mode. If the 
access mode is not known, or if it deviates from the predicted access mode, an unacceptable number of 
collisions may take place. 

Therefore, it is an object of the present invention to provide a memory routing scheme which is access 
mode independent and in which collisions are minimized to some acceptable low level. 

It is another object of the present invention to provide a memory routing scheme in which random 
access modes may be utilized without degrading system performance. 

Other objects and attendant advantages of the present invention will become apparent upon reading the 
following detailed description of the invention along with the accompanying drawings in which like reference 
numerals refer to like parts throughout.

SUMMARY OF THE PRESENT INVENTION

A memory routing scheme for a computer system having M processors and N memories is described.
In the present invention, the memory routing scheme is access mode independent so that the average
number of collisions is minimized regardless of the access mode. In the preferred embodiment, M
processors are coupled through an address randomizer to a routing network, such as a crossbar. The
crossbar is coupled to N memories. When a memory address is specified by a processor, it is acted on by
the randomizer, routed by the crossbar and a routed address is given to the memory. The memory having
the routed address is coupled to the processor for the access. By utilizing a random routing scheme, the
memories are not optimized for any one particular access mode but present the same look to the
processors regardless of the access mode.

In the preferred embodiment of the present invention, each processor has one port into the routing
network. In cases where higher bandwidth is required, additional ports may be used. The average number
of collisions in the scheme of the present invention is a function of the number of memories and number of
access ports. In one embodiment, a hashing table is utilized for the assignment of the routing address to
the crossbar.

In the present invention each processor/randomizer combination produces memory requests at random 
and independent addresses. In the preferred embodiment, a hash table is implemented by repeated 
application of an individual look up table. In the present invention the individual look up table has a one to 
one mapping of input to output addresses so that the complete hash table has a one to one mapping of 
logical addresses and physical memory locations. In one embodiment, for distribution of bit positions, a fully
programmable hash table is utilized comprising repeated application of a ROM and permutation. When 
collisions do occur, the present invention may assign a rotating priority scale which increments after each 
memory cycle. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A is a block diagram illustrating a prior art n x n array of data elements.
Figure 1B is a block diagram illustrating another prior art data array.
Figure 2 is a block diagram illustrating the preferred embodiment of the present invention.
Figure 3 is a block diagram illustrating a ROM and permutation network.
Figure 4 is a block diagram illustrating a hardware implementation of a 16 bit ROM programmable
hashing table.

DETAILED DESCRIPTION OF THE PRESENT INVENTION 

A memory routing scheme to minimize collisions among processors in a multiple processor system is
described. In the following description, numerous specific details, such as access orientation, etc., are set
forth to provide a more thorough description of the present invention. It will be apparent, however, to one
skilled in the art, that the present invention may be practiced without these specific details. In other
instances, well known features have not been described in detail in order not to obscure the present
invention.



PRIOR ART 

Referring to Figure 1A, a prior art memory scheme is illustrated. In the example shown, four processors
P1, P2, P3 and P4 are coupled to four memories 11-14 through routing circuit 10. The routing circuit 10
may be a crossbar where any input line 15-18 of the processors P1-P4 may be coupled to any of the
crossbar output lines 19-22. Thus, each of the processors can access any of the four memories during any
one memory access.

In the present example, the processors may be part of a graphic imaging system in which three
dimensional data is represented by volume elements (voxels) identified by their characteristic information
expressed as color components and opacity. The color components are given by red (R), green (G) and
blue (B) values while the opacity is given by an alpha (A) value. Such a system is described in copending
U. S. Patent Application number 851,776 entitled METHOD AND APPARATUS FOR IMAGING VOLUME
DATA and assigned to the assignee of the present invention.

The memories 11-14 are used to store the components of the voxels. For example, memory 11
contains the red values R0-R(n). Memories 12-14 contain the green, G0-G(n); blue, B0-B(n); and opacity,
A0-A(n), values respectively. This scheme is designed to optimize a memory access in which the four components
of a single voxel are obtained. However, if it is necessary or desirable to access one component of four
different voxels, such as, for example, four red values for voxels 0-3, the scheme illustrated in Figure 1A
requires four memory cycles. This is because only one of processors P1-P4 can access memory 11 at a
time. When all four attempt to access memory 11, a collision results and access must be limited to one
processor at a time.

One prior art attempt to solve the collision problem outlined above is the use of a rotation scheme when
entering and retrieving data from the memories 11-14. This prior art scheme is illustrated in Figure 1B. As
successive component values for each voxel are provided to the memories 11-14, the components are
rotated by the routing circuit so that no single memory contains all components of a single type. That is, no
single memory contains all red components, for example. The components R0, G0, B0 and A0 of voxel 0
are found in memories 11-14 respectively. For voxel 1, components R1, G1 and B1 are found in memories
12-14 respectively and component A1 is found in memory 11. Thus, in the scheme of Figure 1B, each
successive voxel is "shifted" by one so that the memory containing the red component of one voxel is
different from the memory containing the red component of the previous voxel and next successive voxel.

By using the prior art scheme of Figure 1B, the red components, for example, of four successive voxels
may be accessed without collisions by the four processors P1-P4. Similarly, all four components of any
single voxel may be simultaneously accessed without collisions by the processors P1-P4.

The prior art scheme of Figure 1B provides one solution to the access limitations of the scheme of
Figure 1A. However, the scheme does not work well for three dimensional data bases such as may be used
in a three dimensional graphics or imaging application. For example, although the scheme of Figure 1B
permits the simultaneous access of rows and columns, it does not allow the simultaneous access of
"shafts" in a three dimensional data set. Furthermore, the prior art schemes of Figures 1A and 1B are
designed to be optimal for a limited number of access modes.

When these prior art schemes are used in other than limited access mode contexts, such as 
randomized access modes, the number of collisions increases dramatically. 

For example, if a user desired to access the red components of voxels 0, 4, 8 and 12, the scheme of
Figure 1B results in four collisions because each of those voxels has its red component stored in memory
11. Although it may be possible to provide a memory routing scheme which is optimized for that particular
access mode, such a scheme would not be optimized for other access modes. In many applications, the
access mode changes frequently and cannot be predicted beforehand.

Collision analysis for prior art schemes is illustrated in the following example, in which an equal number
N of memories and processors are connected by an NxN crossbar so that each processor can be coupled
to each memory and vice versa. The maximum performance of such a system occurs when each processor
accesses a different one of the N memories. For example, when processors 0-N access memories 0-N
respectively, there are no collisions between processors and 100% throughput is achieved.

However, the throughput of this prior art system drops to a minimum level when each processor
attempts to access the same memory. For example, when processors 0-N all attempt to access memory 0,
the crossbar allows only a single processor to access a single memory at a time. Thus, the throughput
drops to 1/N of the maximum.

In the prior art, a memory routing scheme is optimized for a particular access mode. For example, in
image processing, the memory addresses generally follow a linear sequence, such as a, a + b, a + 2b, a + 3b,
etc. If sequential addresses were placed in sequential memories, and all processors were accessing
sequential addresses, the throughput would become 100% within N memory cycles. This may be seen in
Table 1 below where 8 processors accessing linear incremental addresses are connected to 8 memories
using the addressing scheme: memory = (address modulo 8).

TABLE 1

[Table 1 (processors P0-P7, memories M0-M7): entries not recoverable from the source text.]

The prior art scheme described with respect to Table 1 requires knowledge of the memory address
increment rate b. This value may change depending on the code being run and can take on different
values, particularly in two pass algorithms. The tessellation scheme of Figure 1B can be made to work for
one or two different values of b, but is restricted for random values and random access modes.
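
The dependence on the increment rate b can be made concrete with a short sketch. It assumes the addressing scheme memory = (address modulo 8) described above, eight processors issuing addresses a, a + b, a + 2b, ... in the same memory cycle, and at most one grant per memory per cycle. The function names are illustrative and do not come from the specification.

```python
def banks_hit(start, stride, num_procs=8, num_banks=8):
    """Memory selected by each processor when memory = address mod num_banks."""
    return [(start + p * stride) % num_banks for p in range(num_procs)]


def granted(banks):
    """Only one request per distinct memory can be served in a single cycle."""
    return len(set(banks))


if __name__ == "__main__":
    for stride in (1, 2, 4, 8):
        hit = banks_hit(start=0, stride=stride)
        # Fraction of the eight requests that can be served in the first cycle.
        print(stride, hit, granted(hit) / len(hit))
```

With b = 1 every processor reaches a different memory, while b = 8 sends all eight requests to the same memory, which is the 1/N worst case described earlier.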



PRESENT INVENTION

The present invention utilizes a scheme in which the processors produce memory requests at random
and independent addresses. In other words, the addressing scheme does not follow a linear sequence as in
the above described example. The preferred embodiment of the present invention results in a routing
scheme which has a constant throughput regardless of the access mode. The sustained throughput rate is
approximately 63% of the theoretical maximum for N = M.

The preferred embodiment of the present invention is illustrated in Figure 2. A plurality of processors
P0-P5 are coupled to a routing circuit 10 through a plurality of randomizers 23. The outputs of the routing
circuit 10 are coupled to memories MEM 0 - MEM 5. The randomizers 23 may be implemented with a look
up table. The randomizers 23 are not truly random, but require a one to one correspondence between the
input and output. If there are four inputs to a randomizer for example, then there must be four outputs. By
randomizing the storage of the input data, the average number of collisions between processors for a
random access mode is minimized.

In operation, the memory address is passed through the randomizer and a random memory address is
determined based on the results of a look up table. The data is then stored in the random memory address
location. When retrieving data, the opposite operation takes place so that the correct data may be
accessed.
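
A minimal sketch of a look up table randomizer with the one to one property follows. It assumes a 64 entry address space, a software shuffle standing in for the table contents, and a dictionary standing in for the memories; none of these details come from the specification. The sketch also assumes that a read applies the same forward mapping as the write, while the inverse table recovers a logical address from a physical one.

```python
import random


def make_randomizer(num_addresses, seed=0):
    """Build a one-to-one look up table and its inverse (illustrative only)."""
    rng = random.Random(seed)
    forward = list(range(num_addresses))
    rng.shuffle(forward)                  # a permutation is one-to-one by construction
    inverse = [0] * num_addresses
    for logical, physical in enumerate(forward):
        inverse[physical] = logical
    return forward, inverse


forward, inverse = make_randomizer(64)
memory = {}                               # physical address -> stored value


def store(logical, value):
    """Write through the randomizer: data lands at the randomized address."""
    memory[forward[logical]] = value


def load(logical):
    """Read through the same mapping so the correct data is found again."""
    return memory[forward[logical]]


if __name__ == "__main__":
    store(5, "voxel 5")
    print(forward[5], load(5), inverse[forward[5]])
```

Because the table is a permutation, no two logical addresses share a physical location, so stored data is never overwritten by a different logical address.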

The following Table 2 illustrates the operation of the present invention for 8 processors and 8 
memories. The address generation is randomized. The collisions are indicated in bold. 

TABLE 2

[Table 2 (processors P0-P7, randomized addresses): entries not recoverable from the source text.]

In the example shown, the number of memory cycles is 64. There are 39 successful memory accesses
(no collisions) and 25 collisions. This results in a throughput rate of approximately 61%. For a large number
of cycles, the present invention results in an average throughput rate of 63% of maximum. This compares
to a variable throughput rate of anywhere between 0 and 100% for prior art routing schemes.
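
The behaviour described for Table 2 can be approximated with a small Monte Carlo sketch. It assumes that every processor requests a uniformly random memory on every cycle, that one request per memory is granted, and that blocked requests are simply counted as collisions rather than retried; the function name and parameters are illustrative. Under those assumptions the estimate for 8 processors and 8 memories should come out near 0.66, and it approaches roughly 0.63 as the number of memories grows.

```python
import random


def simulate(num_procs=8, num_banks=8, cycles=10000, seed=1):
    """Estimate the fraction of requests granted under random bank selection."""
    rng = random.Random(seed)
    granted = 0
    for _ in range(cycles):
        requests = [rng.randrange(num_banks) for _ in range(num_procs)]
        granted += len(set(requests))     # one grant per distinct bank
    return granted / (cycles * num_procs)


if __name__ == "__main__":
    print(round(simulate(), 3))
    print(round(simulate(num_procs=256, num_banks=256, cycles=500), 3))
```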

Let m be the number of memory channels (banks) and let r be the number of simultaneous memory
requests from processors, video, I/O, etc. It is desired to calculate the expected number of requests that
will be blocked because of bank collision, and also the standard deviation from this expected value. The
assumption is that each request is made to a random bank of memory.

Let g_r be the number of memory requests that are granted out of r requests. If there are no requests,
none can be granted; so g_0 = 0. Each additional request will fail if it is for one of the g_{r-1} banks for the
requests which have already been granted and will succeed if it is for one of the other m - g_{r-1} banks. The
chance that this request will be granted is thus (m - g_{r-1})/m. So the expected value of g_r is given by:

$$ g_r = g_{r-1} + \frac{m - g_{r-1}}{m} $$
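
The recurrence for the expected number of granted requests can be solved in closed form. The derivation below is a sketch that assumes, as in the text, g_0 = 0 and requests made to independent, uniformly random banks; the symbols g_r, m and r are those defined above.

```latex
\begin{align*}
g_r &= g_{r-1} + \frac{m - g_{r-1}}{m}
     = 1 + \left(1 - \frac{1}{m}\right) g_{r-1}, \qquad g_0 = 0 \\
g_r &= \sum_{k=0}^{r-1} \left(1 - \frac{1}{m}\right)^{k}
     = m\left[1 - \left(1 - \frac{1}{m}\right)^{r}\right] \\
\frac{g_m}{m} &= 1 - \left(1 - \frac{1}{m}\right)^{m}
     \;\longrightarrow\; 1 - \frac{1}{e} \approx 0.632
     \quad \text{as } m \to \infty
\end{align*}
```

The last line gives the throughput when the number of requests equals the number of banks, which is where the figure of about 63% comes from.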



From this, we can build up a table of effective throughput rates (granted requests divided by requests
made, g_r/r):

  r    m=1   m=2   m=3   m=4   m=5   m=6   m=7   m=8  ...  m=256
  1   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  2    .50   .75   .83   .88   .90   .92   .93   .94
  3    .33   .58   .70   .77   .81   .84   .86   .88
  4    .25   .47   .60   .68   .74   .78   .81   .83
  5    .20   .39   .52   .61   .67   .72   .75   .78
  6    .17   .33   .46   .55   .61   .67   .70   .73
  7    .14   .28   .40   .50   .56   .62   .66   .69
  8    .13   .25   .36   .45   .52   .58   .62   .66
 ...
256                                                        0.63

As can be seen from the above table, for large numbers of memories, the throughput rate can be
maintained at an acceptable level (63%) even when the number of memory requests r is equal to the
number of memories m. For example, when there are 256 memories and 256 memory requests during each
memory cycle, the throughput is maintained at 63%. This is achieved due to the random nature of the
memory access.
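
The throughput table can be regenerated directly from the recurrence for the expected number of granted requests. The sketch below assumes that the tabulated value is the expected number of granted requests divided by the number of requests r, with g_0 = 0; the function names are illustrative.

```python
def expected_grants(m, r):
    """Expected number of granted requests for m banks and r random requests."""
    g = 0.0
    for _ in range(r):
        g += (m - g) / m      # chance that the next request hits a free bank
    return g


def throughput(m, r):
    """Fraction of the r requests that are expected to be granted."""
    return expected_grants(m, r) / r


if __name__ == "__main__":
    for r in range(1, 9):
        print(r, " ".join(f"{throughput(m, r):.2f}" for m in range(1, 9)))
    print(256, f"{throughput(256, 256):.2f}")
```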



In addition to the randomization of the memory access addresses, throughput at the processor level is
affected by the method in which the crossbar assigns priority in the case of collisions. For example, if a
convention is used in which a lower numbered processor is given priority, the lower numbered processors
will have a higher throughput than the higher numbered processors. In the preferred embodiment of the
present invention, a rotating priority scale is utilized. The priority scale is incremented after each memory
cycle.

In an alternate embodiment of the present invention, a priority is assigned to each processor based on
the length of time each processor has been waiting. In the case of two or more processors having equal
wait times, a rotating priority scale is utilized.
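
A rotating priority scale of this kind can be sketched as follows. The sketch assumes that each processor requests at most one memory per cycle, that the processor at the current priority position wins any collision, and that the starting position advances by one after every memory cycle. The function name and the data layout are illustrative and are not taken from the specification.

```python
def arbitrate(requests, priority_start):
    """Resolve collisions for one memory cycle.

    requests[p] is the memory requested by processor p, or None.
    Returns a dict mapping each requested memory to the winning processor.
    """
    n = len(requests)
    winners = {}
    for offset in range(n):
        p = (priority_start + offset) % n    # highest priority is visited first
        bank = requests[p]
        if bank is not None and bank not in winners:
            winners[bank] = p
    return winners


if __name__ == "__main__":
    requests = [0, 0, 0, 1]                  # processors 0-2 collide on memory 0
    start = 0
    for cycle in range(4):
        print(cycle, arbitrate(requests, start))
        start = (start + 1) % len(requests)  # rotate after each memory cycle
```

With a fixed priority, processor 0 would win memory 0 on every cycle; rotating the starting position lets processors 0, 1 and 2 each win in turn.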

IMPLEMENTATION OF RANDOMIZATION 

The randomization of the memory addresses in the preferred embodiment of the present invention is
accomplished through the use of a look up table. The table must have a one to one correspondence
between the logical addresses and the physical memory addresses. In addition, the table must provide a
somewhat random assignment function. For a small hash table, a ROM could be used. However, for many
applications, the use of a ROM hash table may be impractical.

In one embodiment of the present invention, a large hash table is implemented by repeated application
of a smaller look up table. The smaller look up table must also have a one to one correspondence so that
the complete hash table also has a one to one correspondence. Referring to Figure 4, a hardware solution
of such a table is illustrated.

This embodiment of the present invention utilizes a plurality of 16x4 look up tables 25-36. There is a
one to one mapping of the smaller look up tables to each other so that the total hash table implemented by
the smaller look up tables has a one to one correspondence as well. As can be seen, each small look up
table 25-36 has four inputs and four outputs. The entire hash table has 16 inputs and 16 outputs.

In the present invention the input to the hash table is comprised of inputs AI-DI of look up tables 25-28
respectively. The outputs 25 AO-DO are coupled to the A inputs of look up tables 29-32 respectively. That
is, output 25 AO is coupled to input 29 AI, output 25 BO is coupled to input 30 AI, output 25 CO is coupled to
input 31 AI and output 25 DO is coupled to input 32 AI.

The second input look up table 26 has its outputs AO-DO coupled to the B inputs of tables 29-32
respectively. Look up table 27 has its outputs AO-DO coupled to the C inputs of tables 29-32 and input
table 28 has its outputs AO-DO coupled to the D inputs of tables 29-32 respectively.

The inputs AI-DI of look up table 33 are coupled to the A outputs of look up tables 29-32 respectively.
That is, input 33 AI is coupled to output AO of look up table 29, input 33 BI is coupled to output AO of look up
table 30, input 33 CI is coupled to output AO of look up table 31 and input 33 DI is coupled to output AO of
look up table 32.

The inputs AI-DI of look up table 34 are coupled to the B outputs of look up tables 29-32
respectively. The inputs AI-DI of look up table 35 are coupled to the C outputs of look up tables 29-32
respectively and the inputs AI-DI of look up table 36 are coupled to the D outputs of look up tables 29-32
respectively.

The distribution of bit positions where the look up table is applied should avoid patterns in the
randomizing function, which could lower throughput in the scheme of the present invention.
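
The three stage arrangement of look up tables 25-36 can be modelled in software to check the one to one property. The sketch below assumes that each small table is a random permutation of the values 0-15 and that the wiring between stages sends output j of table i to input i of table j, as described for Figure 4. The table contents, the seed and the function names are illustrative and are not the contents used in the specification.

```python
import random


def make_lut(rng):
    """A 16-entry, 4-bit look up table with a one-to-one mapping."""
    table = list(range(16))
    rng.shuffle(table)
    return table


def cross_wire(nibbles):
    """Bit j of nibble i becomes bit i of nibble j (wiring between stages)."""
    out = [0, 0, 0, 0]
    for i, nib in enumerate(nibbles):
        for j in range(4):
            if (nib >> j) & 1:
                out[j] |= 1 << i
    return out


def make_hash(seed=0):
    rng = random.Random(seed)
    # Three stages of four tables each, standing in for tables 25-28, 29-32, 33-36.
    stages = [[make_lut(rng) for _ in range(4)] for _ in range(3)]

    def hash16(x):
        nibbles = [(x >> (4 * i)) & 0xF for i in range(4)]
        for s, stage in enumerate(stages):
            nibbles = [stage[i][nibbles[i]] for i in range(4)]
            if s < 2:                      # no cross-wiring after the last stage
                nibbles = cross_wire(nibbles)
        return sum(n << (4 * i) for i, n in enumerate(nibbles))

    return hash16


hash16 = make_hash()

if __name__ == "__main__":
    outputs = {hash16(x) for x in range(1 << 16)}
    print(len(outputs) == 1 << 16)         # True when the mapping is one-to-one
```

Since each table is a permutation and the wiring only moves bits, the composed 16 bit function is also a permutation, which the exhaustive check confirms.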

Figure 3 shows a circuit for a 32 bit randomizer which executes over a period of clock cycles. The
randomizer works by repeatedly applying a 6 bit ROM implemented randomizing function, followed by a 32
bit permutation (jumbling of bit positions). A one to one mapping property is preserved.

A 32 bit number stored in register 43 is separated into two outputs 44 and 47. Output 44 represents
some fixed number of bits of register 43, for example, 6 bits. This 6 bit output 44 is coupled to
ROM 45. The number of outputs of ROM 45 is equal to the number of inputs.
If ROM 45 is a 6 bit input ROM, it has a 6 bit output 46 as well. The remaining bits 47 of the output of
register 43 are passed straight through and combined with the output 46 of ROM 45 to form output 49.
Output 49 represents function f(x). This function is a randomized version of the output of register 43 due
to the action of ROM 45 on a certain number of bits of the output. This function f(x) 49 is then coupled to
permutation network 48 where a bit permutation is applied to the function f(x). This bit permutation is
function g(x). Since the input to permutation network 48 is 32 bits, the output is 32 bits, so there is a one to one
mapping of input to output values. The output 50 of permutation network 48 is a
combination of the functions f and g, acting on input value x. In the preferred embodiment of the present
invention, this application of functions f and g is performed 16 times so that:

x' = gfgfgfgfgfgfgfgfgfgfgfgfgfgfgfgf(x)

The output 50 of the permutation network 48 is coupled in a feedback fashion to multiplexor 41. Also 
input to multiplexor 41 is the input value x. The output 42 of multiplexor 41 is inputted to register 43. 

Since both g and f have one to one mappings, the complete transfer function of the circuit must also 
have a one to one mapping. 

The degree of randomness of the function is dependent both on the 6 bit mapping function f(x), and on
the permutation function g(x). Good results have been shown simply by using a computer generated set of
non-repeating random numbers, both as values for the 6 bit lookup table, and for bit mapping values for the
permutation network.
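
The repeated application of f and g can be sketched in software. The sketch assumes that the ROM acts on the low 6 bits of the 32 bit word (the text only says that some fixed set of bits is used), that the ROM and the bit permutation are filled from a seeded pseudo-random generator rather than the values of the preferred embodiment, and that 16 rounds are applied. All names are illustrative.

```python
import random

WIDTH, ROM_BITS, ROUNDS = 32, 6, 16
LOW_MASK = (1 << ROM_BITS) - 1

_rng = random.Random(0)
ROM = list(range(1 << ROM_BITS))
_rng.shuffle(ROM)                          # f: one-to-one 6 bit look up table
PERM = list(range(WIDTH))
_rng.shuffle(PERM)                         # g: bit i moves to position PERM[i]

ROM_INV = [0] * len(ROM)
for _a, _v in enumerate(ROM):
    ROM_INV[_v] = _a


def f(x, table=ROM):
    """Replace the low 6 bits by the table output; pass the other bits through."""
    return (x & ~LOW_MASK) | table[x & LOW_MASK]


def g(x):
    """Apply the bit permutation."""
    y = 0
    for i in range(WIDTH):
        if (x >> i) & 1:
            y |= 1 << PERM[i]
    return y


def g_inv(x):
    """Undo the bit permutation."""
    y = 0
    for i in range(WIDTH):
        if (x >> PERM[i]) & 1:
            y |= 1 << i
    return y


def randomize(x):
    for _ in range(ROUNDS):
        x = g(f(x))
    return x


def unrandomize(x):
    for _ in range(ROUNDS):
        x = f(g_inv(x), ROM_INV)
    return x


if __name__ == "__main__":
    for a in (0, 1, 2, 3):
        print(a, hex(randomize(a)), unrandomize(randomize(a)))
```

The existence of `unrandomize` is a direct consequence of f and g each being one to one, which is the property the circuit relies on.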

The circuit shown in Figure 3 may be modified in the general case where the address range is of the
form 2^n. In this case, the data path is n bits wide, with 6 bits going to the ROM and n-6 bits being passed
straight through to the n bit permutation network. It is recommended that the number of clock cycles
needed to execute the transfer function be proportional to the number of bits.

An additional modification of the circuit shown in Figure 3 provides a transformation function for address
ranges of the form 3 x 2^n, 5 x 2^n and 7 x 2^n. In this case the ROM contains either a 40, 48 or 52 size
random number table. The permutation network has the added constraint that the upper 3 bits are not
allowed to have any permutation applied to them.

In the preferred embodiment of the present invention, the contents of the ROM look up table 45 are
(address: value, in hexadecimal):

00: 30    10: 17    20: 2A    30: 15
01: 25    11: 10    21: 01    31: 1C
02: 3F    12: 14    22: 22    32: 0D
03: 1E    13: 16    23: 21    33: 04
04: 2B    14: 20    24: 2E    34: 2C
05: 06    15: 11    25: 3A    35: 0F
06: 27    16: 1D

The permutation algorithm for the permutation network 48 is as follows in the preferred embodiment of
the present invention:

The effect of applying functions f and gf at selected stages is illustrated as follows:

Thus, an improved method and apparatus for a memory routing scheme in a parallel processor
architecture is described.

Claims

1. A method for addressing a plurality of memory locations in a multiple processing environment
comprising the steps of:

generating a memory address at each of a plurality of processors;
providing said memory address to a randomizing means;
converting said memory address to a random address;
providing said random address to a routing means;
coupling said each processor to a memory corresponding to said random address through said routing
means.

2. The method of claim 1 wherein said routing means comprises a crossbar.

3. The method of claim 1 wherein said randomizing means comprises a hash table.

4. The method of claim 3 wherein said hash table comprises a plurality of look up tables, each of said
look up tables having an equal number of inputs and outputs, said hash table having an equal number of
inputs and outputs.

5. A circuit comprising:

a plurality of processors (P0 ... P5);
a plurality of randomizing means (23), each randomizing means coupled to one of said plurality of said
processors;
a plurality of memories (MEM 0 ... MEM 5) coupled to said plurality of randomizing means through a routing
circuit (10) such that each of said plurality of memories may be coupled to any of said plurality of
randomizing means;
each of said processors providing a memory address to each of said randomizing means, said randomizing
means (23) converting said memory address to a random address;
said randomizing means (23) providing said random address to said routing means (10) such that each of
said processors (P0 ... P5) is coupled to one of said plurality of memories (MEM 0 ... 5) corresponding to
said random address through said routing means.

6. The circuit of claim 5 wherein said routing means (10) comprises a crossbar circuit.

7. The circuit of claim 5 or 6 wherein said randomizing means (23) comprises a hash table (Fig. 4).

8. The circuit of claim 7 wherein said hash table (Fig. 4) comprises a plurality of look up tables, a first
plurality of look up tables (25 ... 28) is coupled to said processing means (P0 ... P5), said first plurality of
look up tables having a first plurality of outputs (AO ... DO), said first plurality of outputs coupled to a second
plurality of look up tables (29 ... 32), said second plurality of look up tables having a second plurality of
outputs (AO ... DO), said second plurality of outputs coupled to a third plurality of look up tables (33 ... 36),
said third plurality of look up tables having a third plurality of outputs coupled to said routing means (10).